Why should we use R Markdown? It provides a unified framework for authoring documents that contain the code, the output of the code, and any additional commentary or text that we want to include. It’s written in a simple markup language called Markdown (this is what the md stands for). The learning curve is gentle, and the syntax is not hard to pick up with a little practice. With this single document we can keep code, results, and commentary together in one place.
I can’t emphasize enough how much of a difference using R Markdown has made to my own workflow, both in making my code more efficient (in conjunction with tidyverse) and in making it easier to look back on what I did.
In my own pre-R markdown/tidyverse world, I used to work with the script editor. This meant that my script was pretty clunky, with lots of commented code (non-executable code) as well as many intermediate steps. In other words, I had to save intermediate data frames to check on my work. Now I can accomplish this all in a single document.
Take a look at my prior workflow.
I would write code where it was hard to see where the different sections occurred.
Then, I would have a very messy file directory with a whole bunch of intermediate files.
Now, everything is much simpler, including my file directory.
So I really want to encourage you to use the R Markdown functionality even beyond my class. It’s essentially like keeping a lab notebook, which can easily be converted into a final product. In fact, it’s possible to write up journal article submissions and even conference presentations all within R Studio. These are obviously more advanced topics, but they are something you can look forward to as you progress with your own skillset!
Open R Studio and go to File > New File > R Markdown. You’ll see the Source pane take over (the Console will become hidden) and you’ll see a simple (nearly) blank template. You can also try R Notebook. Go ahead and open an R Notebook file and compare the differences across both templates.
What you now have is an R Markdown document. When you execute code within the document, the results appear beneath the code. There are three important types of content: an (optional) YAML header at the top, chunks of R code, and text with simple formatting.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd/CTRL+Shift+Enter.
plot(cars)
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd/CTRL+Option/ALT+I. Add some simple arithmetic operations, then run your code.
2 + 2
## [1] 4
20 - 2
## [1] 18
Notice that the results are printed right below. Let’s go ahead and render this document. Select the Knit button above and notice what happens. You get an HTML version of your document as well as any other output format that you ask for (e.g., Word, PDF).
When you save an R notebook, an HTML file containing the code and output is saved alongside it. If you are using an R notebook, click the Preview button or press Cmd/CTRL+Shift+K to preview the HTML file. The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, it displays the output of each chunk from when it was last run in the editor.
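The Knit button is a front end to `rmarkdown::render()`, so the same rendering can be triggered from the Console. The snippet below is only a sketch: the file name `my_report.Rmd` is a hypothetical placeholder for your own document, and the call is guarded so nothing happens if that file does not exist.

```r
# Hypothetical file name; replace with the path to your own .Rmd file.
rmd_file <- "my_report.Rmd"

# Output formats to produce. "pdf_document" would also work,
# but it needs a LaTeX installation on your machine.
formats <- c("html_document", "word_document")

# Render once per format, but only if the file is actually there.
if (file.exists(rmd_file)) {
  for (fmt in formats) {
    rmarkdown::render(rmd_file, output_format = fmt)
  }
}
```

Unlike Preview, each `render()` call starts from the top of the document and runs every chunk again.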
Here are some text formatting aspects that can be done within the text portion of the notebook:
*italic* or _italic_, **bold** or __bold__, `code`, superscript^2^, subscript~2~
| First | Second |
|---|---|
| a | b |
| c | d |
You can write in-line equations using \(\LaTeX\) formatting: \(SEM = \sigma/\sqrt n\).
Create a horizontal rule with three or more hyphens (---) on a line of their own.
Create a block quote by starting the line with >:
> Important stuff was said
Use CMD/CTRL+OPTION/ALT+i to insert a code chunk:
a <- 3
b <- 4
c <- a*b
c
## [1] 12
You can give an R chunk a name by adding it within the {} after the r, for example {r my-chunk}. Use CMD/CTRL+SHIFT+ENTER to run the code chunk. Results will appear right after the chunk in notebooks!
These are, in my opinion, the most important shortcut keys to remember in R:
Option/ALT+- inserts the assignment operator <-
CMD/CTRL+SHIFT+M inserts the pipe %>%, which is very important for tidyverse
library(tidyverse) #to load in tidyverse library
## ── Attaching packages ───────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.8.0 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
#create a tibble of fake data
fake_data <- tibble(
  Subject = rep(1:10, 2), #repeat a sequence from 1 to 10, twice
  Condition = c(rep("easy", 10), rep("hard", 10)),
  Fake_RT = c(rnorm(10, mean = 250, sd = 2.5), rnorm(10, mean = 294, sd = 2.5))
)
print(fake_data)
## # A tibble: 20 x 3
## Subject Condition Fake_RT
## <int> <chr> <dbl>
## 1 1 easy 247.
## 2 2 easy 249.
## 3 3 easy 253.
## 4 4 easy 254.
## 5 5 easy 253.
## 6 6 easy 249.
## 7 7 easy 250.
## 8 8 easy 249.
## 9 9 easy 253.
## 10 10 easy 249.
## 11 1 hard 294.
## 12 2 hard 293.
## 13 3 hard 295.
## 14 4 hard 292.
## 15 5 hard 295.
## 16 6 hard 290.
## 17 7 hard 292.
## 18 8 hard 293.
## 19 9 hard 292.
## 20 10 hard 293.
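Because `rnorm()` draws random numbers, the values you get will differ from the ones printed above every time the chunk runs. If you want the same fake data on every run (for example, so a knitted report does not change between renders), one option is to fix the random seed first. This is a small sketch; the seed value 123 is arbitrary.

```r
# Fix the seed, then draw five fake reaction times.
set.seed(123)
first_draw <- rnorm(5, mean = 250, sd = 2.5)

# Resetting the same seed reproduces exactly the same draws.
set.seed(123)
second_draw <- rnorm(5, mean = 250, sd = 2.5)

identical(first_draw, second_draw)  # TRUE
```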
Now let’s do something very simple with tidyverse in a code chunk. Let’s summarize the mean and sd for our fake data by the levels of Condition:
group_fake_data <- fake_data %>%
  group_by(Condition) %>%
  summarize(meanRT = mean(Fake_RT), sdRT = sd(Fake_RT))
print(group_fake_data)
## # A tibble: 2 x 3
## Condition meanRT sdRT
## <chr> <dbl> <dbl>
## 1 easy 251. 2.48
## 2 hard 293. 1.41
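The same `summarize()` call can also return the standard error of the mean, using the formula from earlier with the sample standard deviation standing in for \(\sigma\). The sketch below rebuilds the fake data so that it runs on its own; the seed and the column names `n` and `semRT` are my own choices, and your numbers will not match the output printed above.

```r
library(dplyr)
library(tibble)

set.seed(2024)  # arbitrary seed so the sketch is reproducible

fake_data <- tibble(
  Subject = rep(1:10, 2),
  Condition = rep(c("easy", "hard"), each = 10),
  Fake_RT = c(rnorm(10, mean = 250, sd = 2.5), rnorm(10, mean = 294, sd = 2.5))
)

group_sem <- fake_data %>%
  group_by(Condition) %>%
  summarize(
    n = n(),                 # number of observations per condition
    meanRT = mean(Fake_RT),
    sdRT = sd(Fake_RT),
    semRT = sdRT / sqrt(n)   # SEM = sd / sqrt(n)
  )

print(group_sem)
```

Later arguments to `summarize()` can refer to columns created earlier in the same call, which is why `semRT` can reuse `sdRT` and `n`.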
And let’s end by running a paired t-test, since there are two data points per subject, one for the easy condition and one for the hard condition.
#formula method for paired t-test
t.test(Fake_RT ~ Condition, data = fake_data, paired = TRUE)
##
## Paired t-test
##
## data: Fake_RT by Condition
## t = -49.49, df = 9, p-value = 2.817e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -44.37993 -40.50009
## sample estimates:
## mean of the differences
## -42.44001
#indexing method for paired t-test
t.test(fake_data$Fake_RT[fake_data$Condition == "easy"], fake_data$Fake_RT[fake_data$Condition == "hard"], paired = TRUE)
##
## Paired t-test
##
## data: fake_data$Fake_RT[fake_data$Condition == "easy"] and fake_data$Fake_RT[fake_data$Condition == "hard"]
## t = -49.49, df = 9, p-value = 2.817e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -44.37993 -40.50009
## sample estimates:
## mean of the differences
## -42.44001
You can see that both ways of running t.test() give the same results. Because our dependent variable is contained in a single column, I would recommend using the formula syntax method. The indexing method is more helpful when you have your dependent variables split across two columns.
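To illustrate the two-column case, here is a sketch that reshapes the fake data so each condition gets its own column and then runs the paired test on those columns. The data are rebuilt with an arbitrary seed so the chunk runs on its own, and `wide_data` is a name I made up. One caveat, as far as I know: newer versions of R (4.4.0 onward) no longer accept `paired = TRUE` together with the formula syntax, so if the formula call above gives an error, the two-column form below should still work.

```r
library(tibble)
library(tidyr)

set.seed(1)  # arbitrary seed so the sketch is reproducible

fake_data <- tibble(
  Subject = rep(1:10, 2),
  Condition = rep(c("easy", "hard"), each = 10),
  Fake_RT = c(rnorm(10, mean = 250, sd = 2.5), rnorm(10, mean = 294, sd = 2.5))
)

# One row per subject, with separate "easy" and "hard" columns.
wide_data <- pivot_wider(fake_data, names_from = Condition, values_from = Fake_RT)

# Paired t-test on the two columns; pairing is by row, i.e. by subject.
paired_result <- t.test(wide_data$easy, wide_data$hard, paired = TRUE)
print(paired_result)

# In R 4.0.0 or later, a formula version of the same test is:
# t.test(Pair(easy, hard) ~ 1, data = wide_data)
```

Reshaping first also makes the pairing explicit: each row holds both measurements for one subject, so the result does not depend on how the long data happen to be sorted.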
There is a lot of flexibility to customize what the code chunks do. Probably the most important options are eval = FALSE, echo = FALSE, and results = 'hide'. Respectively, these print the code without running it, run the code and show the output while hiding the code, and run the code while hiding the output. Just add these arguments after the chunk name, separated by commas. Let’s prevent output from printing.
first <- 23
second <- 54
third <- first + second
However, we know that a new variable was created.
third^2
## [1] 5929
Some of these options may be useful for writing up reports that don’t need to show code (maybe for a class paper or for a senior colleague/advisor) or if you don’t need to show intermediate steps that may make the output document very long.
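If you want the same options to apply to every chunk, you do not have to repeat them in each chunk header. A common pattern is to set defaults once in a chunk near the top of the document with `knitr::opts_chunk$set()`. The particular combination below is just an illustration of a report that hides code, messages, and warnings.

```r
library(knitr)

# Document-wide defaults: hide code, package start-up messages, and warnings.
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Check the current default for one option.
opts_chunk$get("echo")
```

Options written in an individual chunk header still override these defaults, so a single chunk can show its code again with echo = TRUE. Setting message = FALSE is also what keeps the start-up text printed by library(tidyverse) out of the rendered document.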
Here are some additional resources that are very helpful. The first link is for a simple tutorial that we are going to practice in class today. The second is for a reference sheet published through R Studio. The third is a definitive guide (an e-book) to R Markdown by the developer for R Studio, Yihui Xie.
Now that you have the beginning steps to create your own document, I want you to “write up a report” on a dataset titled CSLex_subset.csv that can be found in the data folder in our Methods repository on GitHub.
Download this dataset to a local directory. Set your working directory to that directory path and import the dataset. Then, just play around with some of the functions that we learned last week to get a sense for the structure of the data set. Just remember to write some prose on what you are trying to do, then insert code chunks to do things, then execute the code within each chunk. You can turn in your final report to me as a PDF or Word document.
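As a starting point, a first chunk for the report might look something like the sketch below. The folder `~/Downloads` is only a placeholder for wherever you saved the file, and `cslex` is a name I picked for the imported data. The import is guarded so the chunk does nothing if the file is not found.

```r
library(readr)
library(dplyr)

# Placeholder folder; change this to the directory where you saved the file.
data_dir <- "~/Downloads"
csv_path <- file.path(data_dir, "CSLex_subset.csv")

if (file.exists(csv_path)) {
  cslex <- read_csv(csv_path)  # import the dataset as a tibble
  glimpse(cslex)               # column names, types, and first few values
  print(summary(cslex))        # quick summary of each column
}
```

Building the full path with `file.path()` is an alternative to calling `setwd()` inside a chunk. When a document is knit, knitr restores the working directory after each chunk, so a `setwd()` in one chunk does not carry over to the next.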